DECLARATION OF PAUL POLAKIS, Ph.D.
I, Paul Polakis, Ph.D., declare and say as follows:

1. I was awarded a Ph.D. by the Department of Biochemistry of the Michigan
State University in 1984. My scientific Curriculum Vitae is attached to and forms 
part of this Declaration (Exhibit A). 

2. I am currently employed by Genentech, Inc. where my job title is Staff 
Scientist. Since joining Genentech in 1999, one of my primary responsibilities has
been leading Genentech's Tumor Antigen Project, which is a large research project
with a primary focus on identifying tumor cell markers that find use as targets for
both the diagnosis and treatment of cancer in humans. 

3. As part of the Tumor Antigen Project, my laboratory has been analyzing 
differential expression of various genes in tumor cells relative to normal cells. 
The purpose of this research is to identify proteins that are abundantly expressed
on certain tumor cells and that are either (i) not expressed, or (ii) expressed at
lower levels, on corresponding normal cells. We call such differentially expressed
proteins "tumor antigen proteins". When such a tumor antigen protein is
identified, one can produce an antibody that recognizes and binds to that protein.
Such an antibody finds use in the diagnosis of human cancer and may ultimately
serve as an effective therapeutic in the treatment of human cancer.

4. In the course of the research conducted by Genentech's Tumor Antigen 
Project, we have employed a variety of scientific techniques for detecting and 
studying differential gene expression in human tumor cells relative to normal cells, 
at genomic DNA, mRNA and protein levels. An important example of one such 
technique is the well known and widely used technique of microarray analysis 
which has proven to be extremely useful for the identification of mRNA molecules
that are differentially expressed in one tissue or cell type relative to another. In the
course of our research using microarray analysis, we have identified
approximately 200 gene transcripts that are present in human tumor cells at
significantly higher levels than in corresponding normal human cells. To date, we
have generated antibodies that bind to about 30 of the tumor antigen proteins
expressed from these differentially expressed gene transcripts and have used these
antibodies to quantitatively determine the level of production of these tumor
antigen proteins in both human cancer cells and corresponding normal cells. We
have then compared the levels of mRNA and protein in both the tumor and normal
cells analyzed. 

5. From the mRNA and protein expression analyses described in paragraph 4
above, we have observed that there is a strong correlation between changes in the 
level of mRNA present in any particular cell type and the level of protein
expressed from that mRNA in that cell type. In approximately 80% of our
observations we have found that increases in the level of a particular mRNA
correlate with changes in the level of protein expressed from that mRNA when
human tumor cells are compared with their corresponding normal cells.

6. Based upon my own experience accumulated in more than 20 years of 
research, including the data discussed in paragraphs 4 and 5 above and my 
knowledge of the relevant scientific literature, it is my considered scientific
opinion that for human genes, an increased level of mRNA in a tumor cell relative
to a normal cell typically correlates to a similar increase in abundance of the
encoded protein in the tumor cell relative to the normal cell. In fact, it remains a
central dogma in molecular biology that increased mRNA levels are predictive of
corresponding increased levels of the encoded protein. While there have been
published reports of genes for which such a correlation does not exist, it is my
opinion that such reports are exceptions to the commonly understood general rule
that increased mRNA levels are predictive of corresponding increased levels of the
encoded protein. 

7. I hereby declare that all statements made herein of my own knowledge are
true and that all statements made on information or belief are believed to be true,
and further that these statements were made with the knowledge that willful false
statements and the like so made are punishable by fine or imprisonment, or both,
under Section 1001 of Title 18 of the United States Code and that such willful
statements may jeopardize the validity of the application or any patent issued
thereon.

Ph.D., Biochemistry, Department of Biochemistry,
Michigan State University (1984)

B.S., Biology, College of Natural Science, Michigan State University (1977)



PROFESSIONAL EXPERIENCE: 
2002-present    Staff Scientist, Genentech, Inc.,
                S. San Francisco, CA

1999-2002       Senior Scientist, Genentech, Inc.,

Genome-wide Study of Gene Copy Numbers,
Transcripts, and Protein Levels in Pairs of 
Non-invasive and Invasive Human Transitional 
Cell Carcinomas* 

Torben F. Ørntoft‡§, Thomas Thykjaer¶, Frederic M. Waldman‖, Hans Wolf**,
and Julio E. Celis‡‡



Gain and loss of chromosomal material is characteristic
of bladder cancer, as well as malignant transformation in
general. The consequences of these changes at both the
transcription and translation levels is at present unknown
partly because of technical limitations. Here we have
attempted to address this question in pairs of non-invasive
and invasive human bladder tumors using a combination
of technology that included comparative genomic
hybridization, high density oligonucleotide array-based
monitoring of transcript levels (5600 genes), and high resolution
two-dimensional gel electrophoresis. The results showed
that there is a gene dosage effect that in some cases
superimposes on other regulatory mechanisms. This effect
depended (p < 0.015) on the magnitude of the comparative
genomic hybridization change. In general (18 of
23 cases), chromosomal areas with more than 2-fold gain
of DNA showed a corresponding increase in mRNA
transcripts. Areas with loss of DNA, on the other hand,
showed either reduced or unaltered transcript levels.
Because most proteins resolved by two-dimensional gels
are unknown it was only possible to compare mRNA and
protein alterations in relatively few cases of well focused
abundant proteins. With few exceptions we found a good
correlation (p < 0.005) between transcript alterations and
protein levels. The implications, as well as limitations,
of the approach are discussed. Molecular & Cellular
Proteomics 1:37-45, 2002.

Aneuploidy is a common feature of most human cancers
(1), but little is known about the genome-wide effect of this
phenomenon at both the transcription and translation levels.
High throughput array studies of the breast cancer cell line
BT474 have suggested that there is a correlation between
DNA copy numbers and gene expression in highly amplified
areas (2), and studies of individual genes in solid tumors
have revealed a good correlation between gene dose and
mRNA or protein levels in the case of c-erb-B2, cyclin d1,
ems1, and N-myc (3-5). However, a high cyclin D1 protein
expression has been observed without simultaneous
amplification (4), and a low level of c-myc copy number
increase was observed without concomitant c-myc protein
overexpression (6).

In human bladder tumors, karyotyping, fluorescent in situ
hybridization, and comparative genomic hybridization (CGH)¹
have revealed chromosomal aberrations that seem to be
characteristic of certain stages of disease progression. In the
case of non-invasive pTa transitional cell carcinomas (TCCs),
this includes loss of chromosome 9 or parts of it, as well as
loss of Y in males. In minimally invasive pT1 TCCs, the
following alterations have been reported: 2q-, 11p-, 1q+,
11q13+, 17q+, and 20q+ (7-12). It has been suggested that
these regions harbor tumor suppressor genes and oncogenes;
however, the large chromosomal areas involved often
contain many genes, making meaningful predictions of the
functional consequences of losses and gains very difficult.

In this investigation we have combined genome-wide
technology for detecting genomic gains and losses (CGH) with
gene expression profiling techniques (microarrays and
proteomics) to determine the effect of gene copy number on
transcript and protein levels in pairs of non-invasive and
invasive human bladder TCCs.

EXPERIMENTAL PROCEDURES 

Material—Bladder tumor biopsies were sampled after informed
consent was obtained and after removal of tissue for routine
pathology examination. By light microscopy tumors 335 and 532 were
staged by an experienced pathologist as pTa (superficial papillary),

¹ The abbreviations used are: CGH, comparative genomic
hybridization; TCC, transitional cell carcinoma; LOH, loss of heterozygosity;
PA-FABP, psoriasis-associated fatty acid-binding protein; 2D,
two-dimensional.

Fig. 1. DNA copy number and mRNA expression level. Shown from left to right are chromosome (Chr.), CGH profiles, gene location and
expression level of specific genes, and overall expression level along the chromosome. A, expression of mRNA in invasive tumor 733 as
compared with the non-invasive counterpart tumor 335. B, expression of mRNA in invasive tumor 827 compared with the non-invasive
counterpart tumor 532. The average fluorescent signal ratio between tumor DNA and normal DNA is shown along the length of the chromosome
(left). The bold curve in the ratio profile represents a mean of four chromosomes and is surrounded by thin curves indicating one standard
deviation. The central vertical line (broken) indicates a ratio value of 1 (no change), and the vertical lines next to it (dotted) indicate a ratio of
0.5 (left) and 2.0 (right). In chromosomes where the non-invasive tumor 335 used for comparison showed alterations in DNA content, the ratio
profile of that chromosome is shown to the right of the invasive tumor profile. The colored bars represent one gene each, identified by the
running numbers above the bars (the name of the gene can be seen at www.MDL.DK/sdata.html). The bars indicate the purported location of
the gene, and the colors indicate the expression level of the gene in the invasive tumor compared with the non-invasive counterpart; >2-fold
increase (black), >2-fold decrease (blue), no significant change (orange). The bar to the far right, entitled Expression, shows the resulting change
in expression along the chromosome; the colors indicate that at least half of the genes were up-regulated (black), at least half of the genes
down-regulated (blue), or more than half of the genes are unchanged (orange). If a gene was absent in one of the samples and present in
another, it was regarded as more than a 2-fold change. A 2-fold level was chosen as this corresponded to one standard deviation in a double
determination of ~1800 genes. Centromeres and heterochromatic regions were excluded from data analysis.



grade I and II, respectively. Tumors 733 and 827 were staged as pT1
(invasive into submucosa); 733 was staged as solid, and 827 was
staged as papillary, both grade III.

mRNA Preparation—Tissue biopsies, obtained fresh,
were embedded immediately in a sodium-guanidinium thiocyanate
solution and stored at -80 °C. Total RNA was isolated using the
RNAzol B RNA isolation method (WAK-Chemie Medical GMBH).
Poly(A)+ RNA was isolated by an oligo(dT) selection step (Oligotex
mRNA kit; Qiagen).

cRNA Preparation—[...] μg of mRNA was used as starting material.
The first and second strand cDNA synthesis was performed using the
SuperScript choice system (Invitrogen) according to the
manufacturer's instructions but using an oligo(dT) primer containing a T7 RNA
polymerase binding site. Labeled cRNA was prepared using the
MEGAscript in vitro transcription kit (Ambion). Biotin-labeled CTP and
UTP (Enzo) were used, together with unlabeled NTPs in the reaction.
Following the in vitro transcription reaction, the unincorporated
nucleotides were removed using RNeasy columns (Qiagen).

Array Hybridization and Scanning—Array hybridization and
scanning was modified from a previous method (13). 10 μg of cRNA was
fragmented at 94 °C for 35 min in buffer containing 40 mM Tris
acetate, pH 8.1, 100 mM KOAc, 30 mM MgOAc. Prior to hybridization,
the fragmented cRNA in a 6x SSPE-T hybridization buffer (1 M NaCl,
10 mM Tris, pH 7.6, 0.005% Triton), was heated to 95 °C for 5 min,
subsequently cooled to 40 °C, and loaded onto the Affymetrix probe
array cartridge. The probe array was then incubated for 16 h at 40 °C
at constant rotation (60 rpm). The probe array was exposed to 10
washes in 6x SSPE-T at 25 °C followed by 4 washes in 0.5x SSPE-T
at 50 °C. The biotinylated cRNA was stained with a streptavidin-
phycoerythrin conjugate, 10 μg/ml (Molecular Probes) in 6x SSPE-T
for 30 min at 26 °C followed by 10 washes in 6x SSPE-T at 25 °C. The
probe arrays were scanned at 560 nm using a confocal laser scanning
microscope (made for Affymetrix by Hewlett-Packard). The readings
from the quantitative scanning were analyzed by Affymetrix gene
expression analysis software.

Microsatellite Analysis—Microsatellite analysis was performed as
described previously (14). Microsatellites were selected by use of
www.ncbi.nlm.nih.gov/genemap98, and primer sequences were
obtained from the genome data base at www.gdb.org. DNA was extracted
from tumor and blood and amplified by PCR in a volume of 20 μl
[...] cycles. The amplicons were denatured and electrophoresed for 3
[...] ABI Prism 377. Data were collected in the Gene Scan program for
fragment analysis. Loss of heterozygosity was defined as less than 33%
of one allele detected in tumor amplicons compared with blood.
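
The loss-of-heterozygosity criterion above can be read as a simple ratio test. The following is a minimal sketch of one possible reading, not the authors' Gene Scan workflow: it assumes that peak heights for both alleles are available for blood and tumor, normalizes the tumor allele ratio by the blood allele ratio, and calls LOH when one allele retains less than 33% of its expected relative signal. The function names and inputs are hypothetical.

```python
# Assumed cutoff taken from the text: less than 33% of one allele retained.
LOH_CUTOFF = 0.33


def allele_retention(blood_a, blood_b, tumor_a, tumor_b):
    """Fraction of the weaker tumor allele retained, normalized to blood.

    All arguments are peak heights (arbitrary units) for alleles a and b.
    A value of 1.0 means the tumor allele ratio equals the blood ratio.
    """
    if blood_a <= 0 or blood_b <= 0:
        raise ValueError("blood must show two measurable alleles")
    if tumor_a < 0 or tumor_b < 0 or (tumor_a == 0 and tumor_b == 0):
        raise ValueError("tumor peaks must be non-negative and not both zero")
    if tumor_a == 0 or tumor_b == 0:
        return 0.0  # one allele is completely absent in the tumor
    ratio = (tumor_a / tumor_b) / (blood_a / blood_b)
    # Either allele may be the one that is lost, so take the smaller side.
    return min(ratio, 1.0 / ratio)


def has_loh(blood_a, blood_b, tumor_a, tumor_b):
    """True when the retained fraction falls below the assumed 33% cutoff."""
    return allele_retention(blood_a, blood_b, tumor_a, tumor_b) < LOH_CUTOFF
```

Normalizing to the blood ratio is a design choice made here so that unequal amplification of the two alleles in normal DNA is not mistaken for allelic loss.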

Proteomic Analysis—TCCs were minced into small pieces and
homogenized in a small glass homogenizer in 0.5 ml of lysis solution.
Samples were stored at -20 °C until use. The procedure for 2D gel
electrophoresis has been described in detail elsewhere (15, 16). Gels
were stained with silver nitrate and/or Coomassie Brilliant Blue.
Proteins were identified by a combination of procedures that included
microsequencing, mass spectrometry, two-dimensional gel Western
immunoblotting, and comparison with the master two-dimensional gel
image of human keratinocyte proteins; see biobase.dk/cgi-bin/celis.

CGH—Hybridization of differentially labeled tumor and normal DNA
to normal metaphase chromosomes was performed as described
previously. Fluorescein-labeled tumor DNA (200 ng), Texas Red-
labeled reference DNA (200 ng), and human Cot-1 DNA (20 μg) were
denatured at 37 °C for 5 min and applied to denatured normal
metaphase slides. Hybridization was at 37 °C for 2 days. After washing,
the slides were counterstained with 0.15 μg/ml 4,6-diamidino-2-
phenylindole in an anti-fade solution. A second hybridization was
performed for all tumor samples using fluorescein-labeled reference DNA
and Texas Red-labeled tumor DNA (inverse labeling) to confirm the
aberrations detected during the initial hybridization. Each CGH
experiment also included a normal control hybridization using
fluorescein- and Texas Red-labeled normal DNA. Digital image analysis was
used to identify chromosomal regions with abnormal fluorescence
ratios, indicating regions of DNA gains and losses. The average
green:red fluorescence intensity ratio profiles were calculated using
four images of each chromosome (eight chromosomes total) with
normalization of the green:red fluorescence intensity ratio for the
entire metaphase and background correction. Chromosome
identification was performed based on 4,6-diamidino-2-phenylindole
banding patterns. Only images showing uniform high intensity
fluorescence with minimal background staining were analyzed. All
centromeres, p arms of acrocentric chromosomes, and
heterochromatic regions were excluded from the analysis.

RESULTS 

Comparative Genomic Hybridization—The CGH analysis
identified a number of chromosomal gains and losses in the
two invasive tumors (stage pT1, TCCs 733 and 827), whereas
the two non-invasive papillomas (stage pTa, TCCs 335 and
532) showed only 9p-, 9q22-q33-, and [...], and 7+, 9q-,
and [...], respectively. Both invasive tumors showed changes
(1q22-24+, 2q14.1-qter-, 3q12-q13.3-, 6q12-q22-,
9q34+, 11q12-q13+, 17+, and 20q11.2-q12+) that are
typical for their disease stage, as well as additional alterations,
some of which are shown in Fig. 1. Areas with gains and
losses deviated from the normal copy number to some extent,
and the average numerical deviation from normal was 0.4-fold
in the case of TCC 733 and 0.3-fold for TCC 827. The largest
changes, amounting to at least a doubling of chromosomal
content, were observed at 1q23 in TCC 733 (Fig. 1A) and
20q12 in TCC 827 (Fig. 1B).

Table I

Correlation between alterations detected by CGH and by expression monitoring

Top, CGH used as independent variable (if CGH alteration - what expression ratio was found); bottom, altered expression used as
independent variable (if expression alteration - what CGH deviation was found).

CGH alterations    Tumor 733 vs. 335    Expression change clusters

mRNA Expression in Relation to DNA Copy Number—The
mRNA levels from the two invasive tumors (TCCs 827 and
733) were compared with the two non-invasive counterparts
(TCCs 532 and 335). This was done in two separate
experiments in which we compared TCCs 733 to 335 and 827 to
532, respectively, using two different scaling settings for the
arrays to rule out scaling as a confounding parameter.
Approximately 1,800 genes that yielded a signal on the arrays
were searched in the Unigene and Genemap data bases for
chromosomal location, and those with a known location
(1096) were plotted as bars covering their purported locus. In
that way it was possible to construct a graphic presentation of
DNA copy number and relative mRNA levels along the
individual chromosomes (Fig. 1).

For each mRNA a ratio was calculated between the level in
the invasive versus the non-invasive counterpart. Bars, which
represent chromosomal location of a gene, were color-coded
according to the expression ratio, and only differences larger
than 2-fold were regarded as informative (Fig. 1). The density
of genes along the chromosomes varied, and areas
containing only one gene were excluded from the calculations. The
resolution of the CGH method is very low, and some of the
outlier data may be because of the fact that the boundaries of
the chromosomal aberrations are not known at high resolution.
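
The scoring rule described above and in the legend to Fig. 1 can be sketched in a few lines. This is an illustrative reading rather than the Affymetrix software the authors used: the function names are hypothetical, an absent call is represented as `None`, and the tie case (exactly half of the genes up and half down) is resolved in favor of "up" purely by convention.

```python
FOLD_CUTOFF = 2.0  # only differences larger than 2-fold are informative


def classify_gene(invasive, non_invasive):
    """Classify one gene as 'up', 'down', or 'unchanged'.

    Levels are positive expression units, or None when the transcript
    was scored as absent. Absent in one sample and present in the other
    is treated as more than a 2-fold change, as in the Fig. 1 legend.
    """
    if invasive is None and non_invasive is None:
        return "unchanged"
    if non_invasive is None:
        return "up"
    if invasive is None:
        return "down"
    ratio = invasive / non_invasive
    if ratio > FOLD_CUTOFF:
        return "up"
    if ratio < 1.0 / FOLD_CUTOFF:
        return "down"
    return "unchanged"


def classify_region(gene_calls):
    """Summarize a chromosomal region from its per-gene calls.

    Returns None for regions with fewer than two genes, which the text
    excludes from the calculations.
    """
    n = len(gene_calls)
    if n < 2:
        return None
    if 2 * gene_calls.count("up") >= n:
        return "up"  # at least half of the genes up-regulated
    if 2 * gene_calls.count("down") >= n:
        return "down"  # at least half of the genes down-regulated
    return "unchanged"
```

Applied to the transcript levels quoted in the legend to Fig. 5, `classify_gene(623, 15659)` (cytokeratin 13, TCC 827 versus TCC 532) returns `"down"`.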
Two sets of calculations were made from the data. For the
first set we used CGH alterations as the independent variable
and estimated the frequency of expression alterations in these
chromosomal areas. In general, areas with a strong gain of
chromosomal material contained a cluster of genes having
increased mRNA expression. For example, both
chromosomes 1q21-q25, 2p and 9q, showed a relative gain of more
than 100% in DNA copy number that was accompanied by
increased mRNA expression levels in the two tumor pairs (Fig.
1). In most cases, chromosomal gains detected by CGH were
accompanied by an increased level of transcripts in both
TCCs 733 (77%) and 827 (80%) (Table I, top). Chromosomal
losses, on the other hand, were not accompanied by
decreased expression in several cases, and were often
registered as having unaltered RNA levels (Table I, top). The
inability to detect RNA expression changes in these cases was not
because of fewer genes mapping to the lost regions (data not
shown).
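
The tabulation described above, in which one measurement is held as the independent variable and the distribution of the other is counted, can be sketched as follows. This is a minimal illustration under stated assumptions: each chromosomal region is assumed to carry one CGH call and one expression call, the function name is hypothetical, and the example regions are invented for demonstration and are not data from Table I.

```python
from collections import Counter


def conditional_frequency(regions, given_index, given_value):
    """Distribution of one call among regions selected by the other call.

    regions: list of (cgh_call, expression_call) tuples.
    given_index: 0 to condition on the CGH call, 1 to condition on the
    expression call. Returns a dict mapping the other call to its
    fraction among the selected regions (empty if none are selected).
    """
    other_index = 1 - given_index
    selected = [r[other_index] for r in regions if r[given_index] == given_value]
    if not selected:
        return {}
    counts = Counter(selected)
    return {call: count / len(selected) for call, count in counts.items()}


# Invented example calls, for illustration only.
example_regions = [
    ("gain", "up"), ("gain", "up"), ("gain", "up"), ("gain", "unchanged"),
    ("loss", "unchanged"), ("loss", "down"),
    ("none", "unchanged"), ("none", "up"),
]

# CGH as the independent variable: what happened to expression in gained areas?
expression_given_gain = conditional_frequency(example_regions, 0, "gain")

# Expression as the independent variable: what did CGH show where expression rose?
cgh_given_up = conditional_frequency(example_regions, 1, "up")
```

The two conditional frequencies generally differ, which is why reporting both directions is informative.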

Fig. 2. Correlation between maximum CGH aberration and the ability to detect expression change by oligonucleotide array
monitoring. The aberration is shown as a numerical -fold change in ratio between invasive tumors 827 (A) and 733 (B) and their non-invasive
counterparts 532 and 335. The expression change was taken from the Expression line to the right in Fig. 1, which depicts the resulting
expression change for a given chromosomal region. At least half of the mRNAs from a given region have to be either up- or down-regulated
to be scored as an expression change. All chromosomal arms in which the CGH ratio plus or minus one standard deviation was outside the
ratio value of one were included.

In the second set of calculations we selected expression
alterations above 2-fold as the independent variable and
estimated the frequency of CGH alterations in these areas. As
above, we found that increased transcript expression
correlated with gain of chromosomal material (TCC 733, 69% and
TCC 827, 59%), whereas reduced expression was often
detected in areas with unaltered CGH ratios (Table I, bottom).
Furthermore, as a control we looked at areas with no
alteration in expression. No alteration was detected by CGH in
most of these areas (TCC 733, 60% and TCC 827, 81%; see
Table I, bottom). Because the ability to observe reduced or
increased mRNA expression clustering to a certain
chromosomal area clearly reflected the extent of copy number
changes, we plotted the maximum CGH aberrations in the
regions showing CGH changes against the ability to detect a
change in mRNA expression as monitored by the
oligonucleotide arrays (Fig. 2). For both tumors TCC 733 (p < 0.015) and
TCC 827 (p < 0.00003) a highly significant correlation was
observed between the level of CGH ratio change (reflecting
the DNA copy number) and alterations detected by the array
based technology (Fig. 2). Similar data were obtained when
areas with altered expression were used as independent
variables. These areas correlated best with CGH when the CGH
ratio deviated 1.6- to 2.0-fold (Table I, bottom) but mostly did
not at lower CGH deviations. These data probably reflect that
loss of an allele may only lead to a 50% reduction in
expression level, which is at the cut-off point for detection of
expression alterations. Gain of chromosomal material can occur to a
much larger extent.
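
The arithmetic behind the last two sentences can be made explicit. The sketch below assumes, purely for illustration, that expression scales linearly with gene copy number from a diploid baseline; the paper does not state this as a model, and the function names are hypothetical.

```python
def expected_ratio(tumor_copies, normal_copies=2):
    """Expected expression ratio if expression were proportional to copy number."""
    return tumor_copies / normal_copies


def exceeds_cutoff(ratio, fold=2.0):
    """True only for changes strictly larger than the fold cutoff."""
    return ratio > fold or ratio < 1.0 / fold


# Loss of one of two alleles gives a ratio of exactly 0.5, which sits on
# the 2-fold cutoff and is therefore not scored as an alteration.
single_allele_loss = expected_ratio(1)

# Gains are not bounded in the same way: five copies give a ratio of 2.5.
five_copy_gain = expected_ratio(5)
```

Under this assumption a single-allele loss can never be scored as more than a 2-fold decrease, whereas amplification beyond four copies can be scored as an increase, which is consistent with the asymmetry the authors describe.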

Microsatellite-based Detection of Minor Areas of
Losses—In TCC 733, several chromosomal areas exhibiting DNA
amplification were preceded or followed by areas with a
normal CGH but reduced mRNA expression (see Fig. 1, TCC 733
chromosome 1q32, 2p21, and 7q21 and q32, 9q34, and
10q22). To determine whether these results were because of
undetected loss of chromosomal material in these regions or
because of other non-structural mechanisms regulating
transcription, we examined two microsatellites positioned at
chromosome 1q25-32 and two at chromosome 2p22. Loss of
heterozygosity (LOH) was found at both 1q25 and at 2p22
indicating that minor deleted areas were not detected with the
resolution of CGH (Fig. 3). Additionally, chromosome 2p in
TCC 733 showed a CGH pattern of gain/no change/gain of
DNA that correlated with transcript increase/decrease/
increase. Thus, for the areas showing increased expression
there was a correlation with the DNA copy number alterations
(Fig. 1A). As indicated above, the mRNA decrease observed in
the middle of the chromosomal gain was because of LOH,
implying that one of the mechanisms for mRNA
down-regulation may be regions that have undergone smaller losses of
chromosomal material. However, this cannot be detected with
the resolution of the CGH method.

In both TCC 733 and TCC 827, the telomeric end of
chromosome 11p showed a normal ratio in the CGH analysis;
however, clusters of five and three genes, respectively, lost
their expression. Two microsatellites (D11S1760, D11S922)
positioned close to MUC2, IGF2, and cathepsin D indicated
LOH as the most likely mechanism behind the loss of
expression (data not shown).

A reduced expression of mRNA observed in TCC 733 at
chromosomes 3q24, 11p11, 12p12.2, 12q21.1, and 16q24
and in TCC 827 at chromosome 11p15.5, 12p11, 15q11.2,
and 18q12 was also examined for chromosomal losses using
microsatellites positioned as close as possible to the gene loci
showing reduced mRNA transcripts. Only the microsatellite
positioned at 18q12 showed LOH (Fig. 3), suggesting that
transcriptional down-regulation of genes in the other regions
may be controlled by other mechanisms.

Fig. 3. Microsatellite analysis of loss of heterozygosity. Tumor
733 showing loss of heterozygosity at chromosome 1q25, detected
(a) by D1S215 close to Hu class I histocompatibility antigen (gene
number 38 in Fig. 1), (b) by D1S2735 close to cathepsin E (gene
number 41 in Fig. 1), and (c) at chromosome 2p23 by [...]254 close
to general β-spectrin (gene number 11 on Fig. 1) and of (d) tumor 827
showing loss of heterozygosity at chromosome 18q12 by D18S1118
close to mitochondrial 3-oxoacyl-coenzyme A thiolase (gene number
12 in Fig. 1). The upper curves show the electropherogram obtained
from normal DNA from leukocytes (N), and the lower curves show the
electropherogram from tumor DNA (T). In all cases one allele is
partially lost in the tumor amplicon.

Relation between Changes in mRNA and Protein Levels—
2D-PAGE analysis, in combination with Coomassie Brilliant
Blue and/or silver staining, was carried out on all four tumors
using fresh biopsy material. 40 well resolved abundant known
proteins migrating in areas away from the edges of the pH
gradient, and having a known chromosomal location, were
selected for analysis in the TCC pair 827/532. Proteins were
identified by a combination of methods (see "Experimental
Procedures"). In general there was a highly significant
correlation (p < 0.005) between mRNA and protein alterations (Fig.
4). Only one gene showed disagreement between transcript
alteration and protein alteration. Except for a group of
cytokeratins encoded by genes on chromosome 17 (Fig. 5) the
analyzed proteins did not belong to a particular family. 26 well
focused proteins whose genes had a known chromosomal
location were detected in TCCs 733 and 335, and of these 19
correlated (p < 0.005) with mRNA changes detected using
the arrays (Fig. 4). For example, PA-FABP was highly
expressed in the non-invasive TCC 335 but lost in the invasive
counterpart (TCC 733; see Fig. 5). The smaller number of
proteins detected in both 733 and 335 was because of the
smaller size of the biopsies that were available.

Fig. 4. Correlation between protein levels as judged by 2D-
PAGE and transcript ratio. For comparison proteins were divided in
three groups, unaltered in level or up- or down-regulated (horizontal
axis). The mRNA ratio as determined by oligonucleotide arrays was
plotted for each gene (vertical axis). ▲, mRNAs that were scored as
present in both tumors used for the ratio calculation; △, mRNAs that
were scored as absent in the invasive tumors (along horizontal axis) or
as absent in non-invasive reference (top of figure). Two different
scalings were used to exclude scaling as a confounder; TCCs 827
and 532 (▲△) were scaled with background suppression, and TCCs
733 and 335 (●○) were scaled without suppression. Both
comparisons showed highly significant (p < 0.005) differences in mRNA ratios
between the groups. Proteins shown were as follows: Group A (from
left), phosphoglucomutase 1, glutathione transferase class μ number
4, fatty acid-binding protein homologue, cytokeratin 15, and
cytokeratin 13; B (from left), fatty acid-binding protein homologue, 28-kDa
heat shock protein, cytokeratin 13, and calcyclin; C (from left),
α-enolase, hnRNP B1, 28-kDa heat shock protein, 14-3-3-6, and
pre-mRNA splicing factor; D, mesothelial keratin K7 (type II); E (from
top), glutathione S-transferase-π and mesothelial keratin K7 (type II);
F (from top and left), adenylyl cyclase-associated protein, E-cadherin,
keratin 19, calgizzarin, phosphoglycerate mutase, annexin IV,
cytoskeletal γ-actin, hnRNP A1, integral membrane protein calnexin
(IP90), hnRNP H, brain-type clathrin light chain-a, hnRNP F, 70-kDa
heat shock protein, heterogeneous nuclear ribonucleoprotein A/B,
translationally controlled tumor protein, liver glyceraldehyde-3-
phosphate dehydrogenase, keratin 8, aldehyde reductase, and Na,K-
ATPase β-1 subunit; G (from top and left), TCP20, calgizzarin,
70-kDa heat shock protein, calnexin, hnRNP H, cytokeratin 15, ATP
synthase, keratin 19, triosephosphate isomerase, hnRNP F, liver
glyceraldehyde-3-phosphate dehydrogenase, glutathione
S-transferase-π, and keratin 8; H (from left), plasma gelsolin, autoantigen
calreticulin, thioredoxin, and NAD+-dependent 15 hydroxyprostaglandin
dehydrogenase; I (from top), prolyl 4-hydroxylase β-subunit,
cytokeratin 20, cytokeratin 17, prohibitin, and fructose 1,6-
biphosphatase; J, annexin II; K, annexin IV; L (from top and left), 90-kDa heat
shock protein, prolyl 4-hydroxylase β-subunit, α-enolase, GRP 78,
cyclophilin, and cofilin.

Fig. 5. Comparison of protein and transcript levels in invasive
and non-invasive TCCs. The upper part of the figure shows a 2D gel
(left) and the oligonucleotide array (right) of TCC 532. The red
rectangles on the upper gel highlight the areas that are compared below.
Identical areas of 2D gels of TCCs 532 and 827 are shown below.
Clearly, cytokeratins 13 and 15 are strongly down-regulated in TCC
827 (red annotation). The tile on the array containing probes for
cytokeratin 15 is enlarged below the array (red arrow) from TCC 532
and is compared with TCC 827. The upper row of squares in each tile
corresponds to perfect match probes; the lower row corresponds to
mismatch probes containing a mutation (used for correction for
unspecific binding). Absence of signal is depicted as black, and the
higher the signal the lighter the color. A high transcript level was
detected in TCC 532 (6151 units) whereas a much lower level was
detected in TCC 827 (absence of signals). For cytokeratin 13, a high
transcript level was also present in TCC 532 (15659 units), and a
much lower level was present in TCC 827 (623 units). The 2D gels at
the bottom of the figure (left) show levels of PA-FABP and adipocyte-
FABP in TCCs 335 and 733 (invasive), respectively. Both proteins are
down-regulated in the invasive tumor. To the right we show the array
tiles for the PA-FABP transcript. A medium transcript level was
detected in the case of TCC 335 (1277 units) whereas very low levels
were detected in TCC 733 (166 units). IEF, isoelectric focusing.

11 chromosomal regions where CGH showed aberrations
that corresponded to the changes in transcript levels also
showed corresponding changes in the protein level (Table II).
These regions included genes that encode proteins that are
found to be frequently altered in bladder cancer, namely
cytokeratins 17 and 20, annexins II and IV, and the fatty
acid-binding proteins PA-FABP and FBP1. Four of these
proteins were encoded by genes in chromosome 17q, a
frequently amplified chromosomal area in invasive bladder
cancers.

DISCUSSION 

Most human cancers have abnormal DNA content, having
lost some chromosomal parts and gained others. The present
study provides some evidence as to the effect of these gains
and losses on gene expression in two pairs of non-invasive
and invasive TCCs using high throughput expression arrays
and proteomics, in combination with CGH. In general, the
results showed that there is a clear individual regulation of the
mRNA expression of single genes, which in some cases was
superimposed by a DNA copy number effect. In most cases,
genes located in chromosomal areas with gains often exhibited
increased mRNA expression, whereas areas showing
losses showed either no change or a reduced mRNA expression.
The latter might be because of the fact that losses most
often are restricted to loss of one allele, and the cut-off point
for detection of expression alterations was a 2-fold change,
thus being at the border of detection. In several cases, however,
an increase or decrease in DNA copy number was
associated with de novo occurrence or complete loss of transcript,
respectively. Some of these transcripts could not be
detected in the non-invasive tumor but were present at relatively
high levels in areas with DNA amplifications in the invasive
tumors (e.g. in TCC 733 transcript from cellular ligand of
annexin II gene (chromosome 1q21) from absent to 2670
arbitrary units; in TCC 827 transcript from small proline-rich
protein 1 gene (chromosome 1q12-q21.1) from absent to
1326 arbitrary units). It may be anticipated from these data
that significant clustering of genes with an increased expression
to a certain chromosomal area indicates an increased
likelihood of gain of chromosomal material in this area.

Table II
Proteins whose expression level correlates with both mRNA and gene dose changes

Protein                 Chromosomal   Tumor     CGH           Transcript              Protein
                        location      TCC       alteration    alteration              alteration
Annexin II              1q21          733       Gain          Abs to Pres (a)         Increase
Annexin IV              2p13          733       [illegible]   3.9-Fold up             Increase
Cytokeratin 17          17q12-q21     827       Gain          3.8-Fold up             Increase
Cytokeratin 20          17q21.1       827       Gain          5.6-Fold up             Increase
(PA-)FABP               8q21.2        827       Loss          10-Fold down            Decrease
FBP1                    9q22          827       Gain          2.3-Fold up             Increase
Plasma gelsolin         9q31          827       Gain          Abs to Pres (a)         Increase
Heat shock protein 28   15q12-q13     827       Loss          2.5-Fold up             Decrease
Prohibitin              17q21         827/733   Gain          3.7-/2.5-Fold up (b)    Increase
Prolyl-4-hydroxyl       17q25         827/733   Gain          6.7-/1.6-Fold up        Increase
hnRNPB1                 7p15          827       Loss          2.5-Fold down           Decrease

(a) Abs, absent; Pres, present.
(b) In cases where the corresponding alterations were found in both TCCs 827 and 733 these are shown as 827/733.

Considering the many possible regulatory mechanisms acting
at the level of transcription, it seems striking that the gene
dose effects were so clearly detectable in gained areas. One
hypothetical explanation may lie in the loss of controlled
methylation in tumor cells (17-19). Thus, it may be possible
that in chromosomes with increased DNA copy numbers two
or more alleles could be demethylated simultaneously leading
to a higher transcription level, whereas in chromosomes with
losses the remaining allele could be partly methylated, turning
off the process (20, 21). A recent report has documented a
ploidy regulation of gene expression in yeast, but in this case all
the genes were present in the same ratio (22), a situation that is
not analogous to that of cancer cells, which show marked
chromosomal aberrations, as well as gene dosage effects.

Several CGH studies of bladder cancer have shown that
some chromosomal aberrations are common at certain
stages of disease progression, often occurring in more than 1
of 3 tumors. In pTa tumors, these include 9p-, 9q-, 1q+, Y-
(2, 6), and in pT1 tumors, 2q-, 11p-, 11q-, 1q+, 5p+, 8q+,
17q+, and 20q+ (2-4, 6, 7). The pTa tumors studied here
showed similar aberrations such as 9p- and 9q22-[illegible]- and
9q- and [illegible], respectively. Likewise, the two minimal invasive
pT1 tumors showed aberrations that are commonly seen at
that stage, and TCC 827 had a remarkable resemblance to the
commonly seen pattern of losses and gains, such as [illegible]
amplification (seen in both tumors); 11q14-q22 loss, the latter
often linked to 17q+ (both tumors), and 1q+ and 9p-, often
linked to 20q+ and 11q13+ (both tumors) (7-9). These observations
indicate that the pairs of tumors used in this study
exhibit chromosomal changes observed in many tumors, and
therefore the findings could be of general importance for
bladder cancer.

Considering that the mapping resolution of CGH is about
20 megabases, it is only possible to get a crude picture of
chromosomal instability using this technique. Occasionally,
we observed reduced transcript levels close to or inside regions
with increased copy numbers. Analysis of these regions
by positioning heterozygous microsatellites as close as possible
to the locus showing reduced gene expression revealed
loss of heterozygosity in several cases. It seems likely that
multiple and different events occur along each chromosomal
arm and that the use of cDNA microarrays for analysis of DNA
copy number changes will reach a resolution that can resolve
these changes, as has recently been proposed (2). The outlier
data were not more frequent at the boundaries of the CGH
aberrations. At present we do not know the mechanism behind
chromosomal aneuploidy and cannot predict whether
chromosomal gains will be transcribed to a larger extent than
the two native alleles. A mechanism such as genetic imprinting has
an impact on the expression level in normal cells and is often
reduced in tumors. However, the relation between imprinting
and gain of chromosomal material is not known.

We regard it as a strength of this investigation that we were
able to compare invasive tumors to benign tumors rather than
to normal urothelium, as the tumors studied were biologically
very close, and probably may represent successive steps in
the progression of bladder cancer. Despite the limited amount
of fresh tissue available it was possible to apply three different
state of the art methods. The observed correlation between
DNA copy number and mRNA expression is remarkable when
one considers that different pieces of the tumor biopsies were
used for the different sets of experiments. This indicates that
bladder tumors are relatively homogeneous, a notion recently
supported by CGH and LOH data that showed a remarkable
similarity even between tumors and distant metastases (10, 23).

In the few cases analyzed, mRNA and protein levels
showed a striking correspondence although in some cases
we found discrepancies that may be attributed to translational
regulation, post-translational processing, protein degradation,
or a combination of these. Some transcripts belong to
undertranslated mRNA pools, which are associated with few
translationally inactive ribosomes; these pools, however,
seem to be rare (24). Protein degradation, for example, may
be very important in the case of polypeptides with a short
half-life (e.g. signaling proteins). A poor correlation between
mRNA and protein levels was found in liver cells as determined
by arrays and 2D-PAGE (25), and a moderate correlation
was recently reported by Ideker et al. (26) in yeast.
Interestingly, our study revealed a much better correlation
between gained chromosomal areas and increased mRNA
levels than between loss of chromosomal areas and reduced
mRNA levels. In general, the level of CGH change determined
the ability to detect a change in transcript. One possible
explanation could be that by losing one allele the change in
mRNA level is not so dramatic as compared with gain of
material, which can be rather unlimited and may lead to a
severalfold increase in gene copy number resulting in a much
higher impact on transcript level. The latter would be much
easier to detect on the expression arrays as the cut-off point
was placed at a 2-fold level so as not to be biased by noise on
the array. Construction of arrays with a better signal to noise
ratio may in the future allow detection of lesser than 2-fold
alterations in transcript levels, a feature that may facilitate the
analysis of the effect of loss of chromosomal areas on transcript
levels.

In eleven cases we found a significant correlation between
DNA copy number, mRNA expression, and protein level. Four
of these proteins were encoded by genes located at a frequently
amplified area in chromosome 17q. Whether DNA
copy number is one of the mechanisms behind alteration of
these eleven proteins is at present unknown and will have to
be proved by other methods using a larger number of samples.
One factor making such studies complicated is the large
extent of protein modification that occurs after translation,
requiring immunoidentification and/or mass spectrometry to
correctly identify the proteins in the gels.
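
The cut-off argument made in the Discussion above (loss of one allele sits at the border of a 2-fold detection threshold, whereas gains can exceed it) can be illustrated with a small sketch. It assumes, purely for illustration, that transcript level scales linearly with gene copy number and that a change must strictly exceed the cut-off to be called; neither assumption is stated in the study, and the function names are hypothetical.

```python
import math


def expected_fold_change(copies_tumor: int, copies_normal: int = 2) -> float:
    """Expected transcript ratio under an assumed linear gene-dose model."""
    if copies_tumor < 0 or copies_normal <= 0:
        raise ValueError("copy numbers must be non-negative (normal > 0)")
    return copies_tumor / copies_normal


def exceeds_cutoff(ratio: float, cutoff: float = 2.0) -> bool:
    """True if the ratio differs from 1.0 by strictly more than `cutoff`-fold."""
    if ratio <= 0:
        return True  # complete loss of transcript is always beyond the cut-off
    return abs(math.log2(ratio)) > math.log2(cutoff)


# Loss of one of two alleles gives a ratio of 0.5, exactly 2-fold down,
# which sits on the cut-off and is therefore not called.
loss_ratio = expected_fold_change(1)
# A gain from 2 to 6 copies gives a ratio of 3.0, well past the cut-off.
gain_ratio = expected_fold_change(6)

print(loss_ratio, exceeds_cutoff(loss_ratio))
print(gain_ratio, exceeds_cutoff(gain_ratio))
```

Under these assumptions a single-allele loss can never be detected on its own, while a high-level gain is easily detected, which is consistent with the asymmetry the authors describe.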

In conclusion, the results presented in this study exemplify
the large body of knowledge that may be possible to gather in
the future by combining state of the art techniques that follow
the pathway from DNA to protein (26). Here, we used a traditional
chromosomal CGH method, but in the future high resolution
CGH based on microarrays with many thousand radiation
hybrid-mapped genes will increase the resolution and information
derived from these types of experiments (2). Combined with
expression arrays analyzing transcripts derived from genes with
known locations, and 2D gel analysis to obtain information at
the post-translational level, a clearer and more developed
understanding of the tumor genome will be forthcoming.

ABSTRACT

Genetic changes underlie tumor progression and may lead to cancer-specific
expression of critical genes. Over 1100 publications have described
the use of comparative genomic hybridization (CGH) to analyze
the pattern of copy number alterations in cancer, but very few of the genes
affected are known. Here, we performed high-resolution CGH analysis on
cDNA microarrays in breast cancer and directly compared copy number
and mRNA expression levels of 13,824 genes to quantitate the impact of
genomic changes on gene expression. We identified and mapped the
boundaries of 24 independent amplicons, ranging in size from 0.2 to 12
Mb. Throughout the genome, both high- and low-level copy number
changes had a substantial impact on gene expression, with 44% of the
highly amplified genes showing overexpression and 10.5% of the highly
overexpressed genes being amplified. Statistical analysis with random
permutation tests identified 270 genes whose expression levels across 14
samples were systematically attributable to gene amplification. These
included most previously described amplified genes in breast cancer and
many novel targets for genomic alterations, including the HOXB7 gene,
the presence of which in a novel amplicon at 17q21.3 was validated in
10.2% of primary breast cancers and associated with poor patient prognosis.
In conclusion, CGH on cDNA microarrays revealed hundreds of
novel genes whose overexpression is attributable to gene amplification.
These genes may provide insights to the clonal evolution and progression
of breast cancer and highlight promising therapeutic targets.

INTRODUCTION 

Gene expression patterns revealed by cDNA microarrays have
facilitated classification of cancers into biologically distinct categories,
some of which may explain the clinical behavior of the tumors
(1-6). Despite this progress in diagnostic classification, the molecular
mechanisms underlying gene expression patterns in cancer have remained
elusive, and the utility of gene expression profiling in the
identification of specific therapeutic targets remains limited.

Accumulation of genetic defects is thought to underlie the clonal
evolution of cancer. Identification of the genes that mediate the effects
of genetic changes may be important by highlighting transcripts that
are actively involved in tumor progression. Such transcripts and their
encoded proteins would be ideal targets for anticancer therapies, as
demonstrated by the clinical success of new therapies against amplified
oncogenes, such as ERBB2 and EGFR (7, 8), in breast cancer and
other solid tumors. Besides amplifications of known oncogenes, over
20 recurrent regions of DNA amplification have been mapped in
breast cancer by CGH* (9, 10). However, these amplicons are often
large and poorly defined, and their impact on gene expression remains
unknown.

We hypothesized that genome-wide identification of those gene
expression changes that are attributable to underlying gene copy
number alterations would highlight transcripts that are actively involved
in the causation or maintenance of the malignant phenotype.
To identify such transcripts, we applied a combination of cDNA and
CGH microarrays to: (a) determine the global impact that gene copy
number variation plays in breast cancer development and progression;
and (b) identify and characterize those genes whose mRNA expression
is most significantly associated with amplification of the corresponding
genomic template.

* The abbreviations used are: CGH, comparative genomic hybridization; FISH,
fluorescence in situ hybridization; RT-PCR, reverse transcription-PCR.

Fig. 1. Impact of gene copy number on global gene-expression levels. A, percentage of
over- and underexpressed genes (Y axis) according to copy number ratios (X axis).
Threshold values used for over- and underexpression were >2.184 (global upper 7% of
the cDNA ratios) and <0.4826 (global lower 7% of the expression ratios). B, percentage
of amplified and deleted genes according to expression ratios. Threshold values for
amplification and deletion were >1.5 and <0.7.

Fig. 2. Genome-wide copy number and expression analysis in the MCF-7 breast cancer cell line. A, chromosomal CGH analysis of MCF-7. The copy number ratio profile (blue
line) across the entire genome from 1p telomere to Xq telomere is shown along with ±1 SD (orange lines). The black horizontal line indicates a ratio of 1.0; red line, a ratio of 0.8;
and green line, a ratio of 1.2. B-C, genome-wide copy number analysis in MCF-7 by CGH on cDNA microarray. The copy number ratios were plotted as a function of the position
[text illegible] green dots, the next 5% of the expression ratios
(underexpressed genes); the rest of the observations are shown with black crosses. The chromosome numbers are shown at the bottom of the figure, and chromosome boundaries are
indicated with a dashed line.

MATERIALS AND METHODS

Breast Cancer Cell Lines. Fourteen breast cancer cell lines (BT-20, BT-474,
HCC1428, Hs578t, MCF7, MDA-361, MDA-436, MDA-453, MDA-468,
SKBR-3, T-47D, UACC812, ZR-75-1, and ZR-75-30) were obtained from the
American Type Culture Collection (Manassas, VA). Cells were grown under
recommended culture conditions. Genomic DNA and mRNA were isolated
using standard protocols.

Copy Number and Expression Analyses by cDNA Microarrays. The
preparation and printing of the 13,824 cDNA clones on glass slides were
performed as described (11-13). Of these clones, 244 represented uncharacterized
expressed sequence tags, and the remainder corresponded to known
genes. CGH experiments on cDNA microarrays were done as described (14,
15). Briefly, 20 µg of genomic DNA from breast cancer cell lines and normal
human WBCs were digested for 14-18 h with [illegible] and RsaI (Life Technologies,
Inc., Rockville, MD) and purified by phenol/chloroform extraction. Six
µg of digested cell line DNAs were labeled with Cy3-dUTP (Amersham
Pharmacia) and normal DNA with Cy5-dUTP (Amersham Pharmacia) using
the Bioprime Labeling kit (Life Technologies, Inc.). Hybridization (14, 15) and
posthybridization washes (13) were done as described. For the expression
analyses, a standard reference (Universal Human Reference RNA; Stratagene,
La Jolla, CA) was used in all experiments. Forty µg of reference RNA were
labeled with Cy3-dUTP and 3.5 µg of test mRNA with Cy5-dUTP, and the
labeled cDNAs were hybridized on microarrays as described (13, 15). For both
microarray analyses, a laser confocal scanner (Agilent Technologies, Palo
Alto, CA) was used to measure the fluorescence intensities at the target
locations using the DEARRAY software (16). After background subtraction,
average intensities at each clone in the test hybridization were divided by the
average intensity of the corresponding clone in the control hybridization. For
the copy number analysis, the ratios were normalized on the basis of the
distribution of ratios of all targets on the array and for the expression analysis
on the basis of 88 housekeeping genes, which were spotted four times onto the
array. Low quality measurements (i.e., copy number data with reference
intensity <100 fluorescent units, and expression data with both test and
reference intensity <100 fluorescent units and/or with spot size <50 units)
were excluded from the analysis and were treated as missing values. The
distributions of fluorescence ratios were used to define cutpoints for increased/
decreased copy number. Genes with CGH ratio >1.43 (representing the upper
5% of the CGH ratios across all experiments) were considered to be amplified,
and genes with ratio <0.73 (representing the lower 5%) were considered to be
deleted.
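
The quality filter and cutpoints described above can be sketched as follows. This is a minimal illustration, not the authors' DEARRAY pipeline: it omits background subtraction and normalization, and the function and constant names are hypothetical. Only the numeric values (reference intensity below 100 units treated as missing, ratio above 1.43 called amplified, ratio below 0.73 called deleted) come from the text.

```python
from typing import Optional

AMPLIFIED_CUTOFF = 1.43      # upper 5% of CGH ratios, per the text
DELETED_CUTOFF = 0.73        # lower 5% of CGH ratios, per the text
MIN_REFERENCE_INTENSITY = 100.0  # fluorescent units, per the text


def cgh_ratio(test_intensity: float, reference_intensity: float) -> Optional[float]:
    """Test/reference ratio, or None when the spot fails the quality filter."""
    if reference_intensity < MIN_REFERENCE_INTENSITY:
        return None  # low-quality measurement, treated as a missing value
    return test_intensity / reference_intensity


def classify_copy_number(ratio: Optional[float]) -> str:
    """Label a CGH ratio using the fixed cutpoints quoted above."""
    if ratio is None:
        return "missing"
    if ratio > AMPLIFIED_CUTOFF:
        return "amplified"
    if ratio < DELETED_CUTOFF:
        return "deleted"
    return "unchanged"


for test, ref in [(300.0, 150.0), (100.0, 200.0), (210.0, 200.0), (50.0, 80.0)]:
    print(test, ref, classify_copy_number(cgh_ratio(test, ref)))
```

In a real analysis the ratios would first be normalized against the distribution of all targets on the array, as the text states; the sketch applies the cutpoints to raw ratios only to keep the example short.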

Statistical Analysis of CGH and cDNA Microarray Data. To evaluate
the influence of copy number alterations on gene expression, we applied the
following statistical approach. CGH and cDNA calibrated intensity ratios were
log-transformed and normalized using median centering of the values in each
cell line. Furthermore, cDNA ratios for each gene across all 14 cell lines were
median centered. For each gene, the CGH data were represented by a vector
that was labeled 1 for amplification (ratio, >1.43) and 0 for no amplification.
Amplification was correlated with gene expression using the signal-to-noise
statistics (1). We calculated a weight, $w_g$, for each gene as follows:

\[ w_g = \frac{m_{g1} - m_{g0}}{\sigma_{g1} + \sigma_{g0}} \]

where $m_{g1}$, $\sigma_{g1}$ and $m_{g0}$, $\sigma_{g0}$ denote the means and SDs for the expression
levels for amplified and nonamplified cell lines, respectively. To assess the
statistical significance of each weight, we performed 10,000 random permutations
of the label vector. The probability that a gene had a larger or equal
weight by random permutation than the original weight was denoted by α. A
low α (<0.05) indicates a strong association between gene expression and
amplification.
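
The signal-to-noise weight and its permutation test, as described above, can be sketched for a single gene. This is an illustrative reading of the text, not the authors' code: the function names are hypothetical, the sample standard deviation is assumed (the text does not say which SD estimator was used), and the expression values in the usage example are made up.

```python
import random
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def signal_to_noise(expression: Sequence[float], labels: Sequence[int]) -> float:
    """Weight w_g = (m_g1 - m_g0) / (sd_g1 + sd_g0) for one gene.

    labels[i] is 1 if cell line i is amplified for this gene, else 0.
    """
    amplified = [x for x, lab in zip(expression, labels) if lab == 1]
    nonamplified = [x for x, lab in zip(expression, labels) if lab == 0]
    if len(amplified) < 2 or len(nonamplified) < 2:
        raise ValueError("need at least two cell lines in each group")
    spread = stdev(amplified) + stdev(nonamplified)  # sample SD: an assumption
    if spread == 0:
        raise ValueError("zero variance in both groups")
    return (mean(amplified) - mean(nonamplified)) / spread


def permutation_alpha(expression: Sequence[float], labels: Sequence[int],
                      n_permutations: int = 10_000, seed: int = 0) -> Tuple[float, float]:
    """Return (observed weight, fraction of label shuffles with weight >= observed)."""
    rng = random.Random(seed)
    observed = signal_to_noise(expression, labels)
    shuffled: List[int] = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # permute the amplification labels across cell lines
        if signal_to_noise(expression, shuffled) >= observed:
            hits += 1
    return observed, hits / n_permutations


# Made-up log-ratios for 14 cell lines; the first four are labeled amplified.
expr = [3.1, 2.9, 3.4, 3.0, 0.1, -0.2, 0.3, 0.0, -0.1, 0.2, -0.3, 0.15, 0.05, -0.25]
labs = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
weight, alpha = permutation_alpha(expr, labs, n_permutations=2000)
print(round(weight, 2), alpha)
```

Shuffling the label vector keeps the number of amplified cell lines fixed, so every permuted weight is computed on groups of the same size as the observed one.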

Table 1 Summary of independent amplicons in 14 breast cancer cell lines by
CGH microarray

Location    Start (Mb)    End (Mb)    Size (Mb)
                                            3.3
                                            0.3
                                            2.7
                                            5.3
                                            5.2
                                            0.7
                                            6.0
                                            4.6
                                           12.3
                                            1.0
                                            0.6
                                            4.2
                                            0.9
                                            1.6
                                            3.0
                                            3.3
                                            5.9
                                            5.1
                                            0.8
                                            1.3
                                            1.6
                                            3.0
                                            7.8

Genomic Localization of cDNA Clones and Amplicon Mapping. Each
cDNA clone on the microarray was assigned to a Unigene cluster using the
Unigene Build 141.* A database of genomic sequence alignment information
for mRNA sequences was created from the August 2001 freeze of the University
of California Santa Cruz's GoldenPath database.† The chromosome and
bp positions for each cDNA clone were then retrieved by relating these data
sets. Amplicons were defined as a CGH copy number ratio >2.0 in at least two
adjacent clones in two or more cell lines or a CGH ratio >2.0 in at least three
adjacent clones in a single cell line. The amplicon start and end positions were
extended to include neighboring nonamplified clones (ratio, <1.5). The
amplicon size determination was partially dependent on local clone density.

* Internet address: http://research.nhgri.nih.gov/microarray/downloadable_cdna.html.
† Internet address: www.genome.ucsc.edu.
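
The amplicon definition given above can be read as two rules applied to clones ordered by genomic position, and the sketch below implements one such reading. It is an interpretation under stated assumptions, not the authors' procedure: the two-cell-line rule is taken to mean that the same pair of adjacent clones exceeds 2.0 in at least two cell lines, the subsequent extension to neighboring nonamplified clones (ratio <1.5) is not implemented, and all names and example ratios are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

HIGH_RATIO = 2.0  # CGH ratio threshold from the amplicon definition


def call_amplicons(ratios: Dict[str, List[Optional[float]]]) -> List[Tuple[int, int]]:
    """Return (first, last) clone-index intervals called as amplicons.

    `ratios` maps a cell line name to CGH ratios for clones sorted by
    position on one chromosome; None marks a missing measurement.
    """
    if not ratios:
        return []
    n = len(next(iter(ratios.values())))
    if any(len(values) != n for values in ratios.values()):
        raise ValueError("all cell lines must cover the same clones")

    high = {line: [r is not None and r > HIGH_RATIO for r in values]
            for line, values in ratios.items()}
    flagged = [False] * n

    # Rule 1: two adjacent clones above threshold in two or more cell lines.
    for i in range(n - 1):
        support = sum(1 for h in high.values() if h[i] and h[i + 1])
        if support >= 2:
            flagged[i] = flagged[i + 1] = True

    # Rule 2: three or more adjacent clones above threshold in one cell line.
    for h in high.values():
        i = 0
        while i < n:
            if not h[i]:
                i += 1
                continue
            j = i
            while j + 1 < n and h[j + 1]:
                j += 1
            if j - i + 1 >= 3:
                for k in range(i, j + 1):
                    flagged[k] = True
            i = j + 1

    # Merge flagged clones into contiguous intervals.
    intervals: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, is_flagged in enumerate(flagged):
        if is_flagged and start is None:
            start = i
        elif not is_flagged and start is not None:
            intervals.append((start, i - 1))
            start = None
    if start is not None:
        intervals.append((start, n - 1))
    return intervals


example = {
    "line_A": [1.0, 2.5, 2.6, 1.1, 1.0, 1.0, 1.0, 1.0],
    "line_B": [1.0, 2.2, 2.4, 1.0, 1.0, 3.0, 3.1, 2.9],
    "line_C": [1.0, 1.0, 1.0, 1.0, 2.8, 1.0, None, 1.0],
}
print(call_amplicons(example))
```

In the example, clones 1-2 are called because two cell lines agree, clones 5-7 are called from a single cell line with three adjacent high clones, and the isolated high clone 4 in line_C is not called.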

FISH. Dual-color interphase FISH to breast cancer cell lines was done as
described (17). Bacterial artificial chromosome clone RP11-361K8 was labeled
with SpectrumOrange (Vysis, Downers Grove, IL), and SpectrumOrange-labeled
probe for EGFR was obtained from Vysis. SpectrumGreen-labeled
chromosome 7 and 17 centromere probes (Vysis) were used as a
reference. A tissue microarray containing 612 formalin-fixed, paraffin-embedded
primary breast cancers (17) was applied in FISH analyses as described
(18). The use of these specimens was approved by the Ethics Committee of the
University of Basel and by the NIH. Specimens containing a 2-fold or higher
increase in the number of test probe signals, as compared with corresponding
centromere signals, in at least 10% of the tumor cells were considered to be
amplified. Survival analysis was performed using the Kaplan-Meier method
and the log-rank test.

RT-PCR. The HOXB7 expression level was determined relative to
GAPDH. Reverse transcription and PCR amplification were performed using
Access RT-PCR System (Promega Corp., Madison, WI) with 10 ng of mRNA
as a template. HOXB7 primers were 5'-OAGCAGAGCiGA(rrC(X5ACTr-3'
and 5'-(3CGTCAdaTAGCGATrOTAO-3'.

RESULTS 

Global Effect of Copy Number on Gene Expression. 13,824
arrayed cDNA clones were applied for analysis of gene expression
and gene copy number (CGH microarrays) in 14 breast cancer cell
lines. The results illustrate a considerable influence of copy number
on gene expression patterns. Up to 44% of the highly amplified
transcripts (CGH ratio, >2.5) were overexpressed (i.e., belonged to
the global upper 7% of expression ratios), compared with only 6% for
genes with normal copy number levels (Fig. 1A). Conversely, 10.5%
of the transcripts with high-level expression (cDNA ratio, >10)
showed increased copy number (Fig. 1B). Low-level copy number
increases and decreases were also associated with similar, although
less dramatic, outcomes on gene expression (Fig. 1).

Identification of Distinct Breast Cancer Amplicons. Base-pair
locations obtained for 11,994 cDNAs (86.8%) were used to plot copy
number changes as a function of genomic position (Fig. 2, Supplement
Fig. A). The average spacing of clones throughout the genome
was 267 kb. This high-resolution mapping identified 24 independent
breast cancer amplicons, spanning from 0.2 to 12 Mb of DNA (Table
1). Several amplification sites detected previously by chromosomal
CGH were validated, with 1q21, 17q12-q21.2, 17q22-q23, 20q13.1,
and 20q13.2 regions being most commonly amplified. Furthermore,
the boundaries of these amplicons were precisely delineated. In addition,
novel amplicons were identified at 9p13 (38.65-39.25 Mb),
and 17q21.3 (52.47-55.80 Mb).

Direct Identification of Putative Amplification Target Genes.
The cDNA/CGH microarray technique enables the direct correlation
of copy number and expression data on a gene-by-gene basis
throughout the genome. We directly annotated high-resolution
CGH plots with gene expression data using color coding. Fig. 2C
shows that most of the amplified genes in the MCF-7 breast cancer
cell line at 1p13, 17q22-q23, and 20q13 were highly overexpressed.
A view of chromosome 7 in the MDA-468 cell line
implicates EGFR as the most highly overexpressed and amplified
gene at 7p11-p12 (Fig. 3A). In BT-474, the two known amplicons
at 17q12 and 17q22-q23 contained numerous highly overexpressed
genes (Fig. 3B). In addition, several genes, including the
homeobox genes HOXB2 and HOXB7, were highly amplified in a
previously undescribed independent amplicon at 17q21.3. HOXB7
was systematically amplified (as validated by FISH, Fig. 3B, inset)
as well as overexpressed (as verified by RT-PCR, data not shown)
in BT-474, UACC812, and ZR-75-30 cells. Furthermore, this novel




Fig. 3. Annotation of gene expression data on CGH microarray profiles. A, genes in the
7p11-p12 amplicon in the MDA-468 cell line are highly expressed (red dots) and include
the EGFR oncogene. B, several genes in the 17q12, 17q21.3, and 17q23 amplicons in the
BT-474 breast cancer cell line are highly overexpressed (red) and include the HOXB7
gene. The data labels and color coding are as indicated for Fig. 2C. Insets show
chromosomal CGH profiles for the corresponding chromosomes and validation of the
increased copy number by interphase FISH using EGFR (red) and chromosome 7
centromere probe (green) to MDA-468 (A) and HOXB7-specific probe (red) and chromosome
17 centromere (green) to BT-474 cells (B).

GENE EXPRESSION PATTERNS IN BREAST CANCER

significant correlation (α value <0.05) between
copy number and gene expression. Name,
chromosomal location, and the α value for each
list of 270 genes is shown in supplemental Fig. B.

amplification was validated to be present in 10.2% of 363 primary
breast cancers by FISH to a tissue microarray and was associated
with poor prognosis of the patients (P = 0.001).

Statistical Identification and Characterization of 270 Highly
Expressed Genes in Amplicons. Statistical comparison of expression
levels of all genes as a function of gene amplification identified
270 genes whose expression was significantly influenced by copy
number across all 14 cell lines (Fig. 4, Supplemental Fig. B). According
to the gene ontology data,* 91 of the 270 genes represented
hypothetical proteins or genes with no functional annotation, whereas
179 had associated functional information available. Of these, 151
(84%) are implicated in apoptosis, cell proliferation, signal transduction,
and transcription, whereas 28 (16%) had functional annotations
that could not be directly linked with cancer.

* Internet address: http://www.geneontology.org.



DISCUSSION 

The importance of recurrent gene and chromosome copy number
changes in the development and progression of solid tumors has been
characterized in >1000 publications applying CGH (9, 10), as well
as in a large number of other molecular cytogenetic, cytogenetic, and
molecular genetic studies. The effects of these somatic genetic
changes on gene expression levels have remained largely unknown,
although a few studies have explored gene expression changes occurring
in specific amplicons (15, 19-21). Here, we applied genome-wide
cDNA microarrays to identify transcripts whose expression
changes were attributable to underlying gene copy number alterations
in breast cancer.

The overall impact of copy number on gene expression patterns was
substantial with the most dramatic effects seen in the case of high-level
copy number increase. Low-level copy number gains and losses
also had a significant influence on expression levels of genes in the
regions affected, but these effects were more subtle on a gene-by-gene
basis than those of high-level amplifications. However, the impact of
low-level gains on the dysregulation of gene expression patterns in
cancer may be equally important if not more important than that of
high-level amplifications. Aneuploidy and low-level gains and losses
of chromosomal arms represent the most common types of genetic
alterations in breast and other cancers and, therefore, have an influence
on many genes. Our results in breast cancer extend the recent
studies on the impact of aneuploidy on global gene expression patterns
in yeast cells, acute myeloid leukemia, and a prostate cancer
model system (22-24).

The CGH microarray analysis identified 24 independent breast
cancer amplicons. We defined the precise boundaries for many amplicons
detected previously by chromosomal CGH (9, 10, 25, 26) and
also discovered novel amplicons that had not been detected previously,
presumably because of their small size (only 1-2 Mb) or close
proximity to other larger amplicons. One of these novel amplicons
involved the homeobox gene region at 17q21.3 and led to the overexpression
of the HOXB7 and HOXB2 genes. The homeodomain
transcription factors are known to be key regulators of embryonic
development and have been occasionally reported to undergo aberrant
expression in cancer (27, 28). HOXB7 transfection induced cell proliferation
in melanoma, breast, and ovarian cancer cells and increased
tumorigenicity and angiogenesis in breast cancer (29-32). The present
results imply that gene amplification may be a prominent mechanism
for overexpressing HOXB7 in breast cancer and suggest that
HOXB7 contributes to tumor progression and confers an aggressive
disease phenotype in breast cancer. This view is supported by our
finding of amplification of HOXB7 in 10% of 363 primary breast
cancers, as well as an association of amplification with poor prognosis
of the patients.

We carried out a systematic search to identify genes whose
expression levels across all 14 cell lines were attributable to
amplification status. Statistical analysis revealed 270 such genes
(representing ~2% of all genes on the array), including not only
previously described amplified genes, such as HER-2, MYC,
EGFR, ribosomal protein s6 kinase, and AIB3, but also numerous
novel genes such as NRAS-related gene (1p13), syndecan-2 (8q22),
and bone morphogenic protein (20q13.1), whose activation by
amplification may similarly promote breast cancer progression.
Most of the 270 genes have not been implicated previously in
breast cancer development and suggest novel pathogenetic mechanisms.
Although we would not expect all of them to be causally
involved, it is intriguing that 84% of the genes with associated
functional information were implicated in apoptosis, cell proliferation,
signal transduction, transcription, or other cellular processes
that could directly imply a possible role in cancer progression.
Therefore, a detailed characterization of these genes may provide
biological insights to breast cancer progression and might lead to
the development of novel therapeutic strategies.

In summary, we demonstrate application of cDNA microarrays
to the analysis of both copy number and expression levels of over
12,000 transcripts throughout the breast cancer genome, roughly
once every 267 kb. This analysis provided: (a) evidence of a
prominent global influence of copy number changes on gene
expression levels; (b) a high-resolution map of 24 independent
amplicons in breast cancer; and (c) identification of a set of 270
genes, the overexpression of which was statistically attributable to
gene amplification. Characterization of a novel amplicon at
17q21.3 implicated amplification and overexpression of the
HOXB7 gene in breast cancer, including a clinical association
between HOXB7 amplification and poor patient prognosis. Overall,
our results illustrate how the identification of genes activated by
gene amplification provides a powerful approach to highlight
genes with an important role in cancer as well as to prioritize and
validate putative targets for therapy development.

Genomic DNA copy number alterations are key genetic events in
the development and progression of human cancers. Here we
report a genome-wide microarray comparative genomic hybridization
(array CGH) analysis of DNA copy number variation in
a series of primary human breast tumors. We have profiled DNA
copy number alteration across 6,691 mapped human genes, in 44
predominantly advanced, primary breast tumors and 10 breast
cancer cell lines. While the overall patterns of DNA amplification
and deletion corroborate previous cytogenetic studies, the high-resolution
(gene-by-gene) mapping of amplicon boundaries and
the quantitative analysis of amplicon shape provide significant
improvement in the localization of candidate oncogenes. Parallel
microarray measurements of mRNA levels reveal the remarkable
degree to which variation in gene copy number contributes to
variation in gene expression in tumor cells. Specifically, we find
that 62% of highly amplified genes show moderately or highly
elevated expression, that DNA copy number influences gene expression
across a wide range of DNA copy number alterations
(deletion, low-, mid- and high-level amplification), that on average,
a 2-fold change in DNA copy number is associated with a corresponding
1.5-fold change in mRNA levels, and that overall, at least
12% of all the variation in gene expression among the breast
tumors is directly attributable to underlying variation in gene copy
number. These findings provide evidence that widespread DNA
copy number alteration can lead directly to global deregulation of
gene expression, which may contribute to the development or
progression of cancer.

Conventional cytogenetic techniques, including comparative
genomic hybridization (CGH), have led to the identification
of a number of recurrent regions of DNA copy number
alteration in breast cancer cell lines and tumors (2-4). While
some of these regions contain known or candidate oncogenes
[e.g., FGFR1 (8p11), MYC (8q24), CCND1 (11q13), ERBB2
(17q12), and ZNF217 (20q13)] and tumor suppressor genes
[RB1 (13q14) and TP53 (17p13)], the relevant gene(s) within
other regions (e.g., gain of 1q, 8q22, and 17q22-24, and loss of
8p) remain to be identified. A high-resolution genome-wide
map, delineating the boundaries of DNA copy number
alterations in tumors, should facilitate the localization and
identification of oncogenes and tumor suppressor genes in breast
cancer. In this study, we have created such a map, using
array-based CGH (5-7) to profile DNA copy number alteration
in a series of breast cancer cell lines and primary tumors.

An unresolved question is the extent to which the widespread
DNA copy number changes that we and others have identified
in breast tumors alter expression of genes within involved
regions. Because we had measured mRNA levels in parallel in
the same samples (8), using the same DNA microarrays, we had
an opportunity to explore on a genomic scale the relationship
between DNA copy number changes and gene expression. From
this analysis, we have identified a significant impact of
widespread DNA copy number alteration on the transcriptional
programs of breast tumors.

Materials and Methods 

Tumors and Cell Lines. Primary breast tumors were predominantly
large (>3 cm), intermediate-grade, infiltrating ductal
carcinomas, with more than 50% being lymph node positive. The
fraction of tumor cells within specimens averaged at least 50%.
Details of individual tumors have been published (8, 9), and
are summarized in Table 1, which is published as supporting
information on the PNAS web site, www.pnas.org. Breast cancer
cell lines were obtained from the American Type Culture
Collection. Genomic DNA was isolated either using Qiagen
genomic DNA columns, or by phenol/chloroform extraction
followed by ethanol precipitation.

DNA Labeling and Microarray Hybridizations. Genomic DNA
labeling and hybridizations were performed essentially as described
in Pollack et al. (7), with slight modifications. Two micrograms
of DNA was labeled in a total volume of 50 microliters and the
volumes of all reagents were adjusted accordingly. "Test" DNA
(from tumors and cell lines) was fluorescently labeled (Cy5) and
hybridized to a human cDNA microarray containing 6,691
different mapped human genes (i.e., UniGene clusters). The
"reference" (labeled with Cy3) for each hybridization was
normal female leukocyte DNA from a single donor. The fabrication
of cDNA microarrays and the labeling and hybridization of
mRNA samples have been described (8).

Data Analysis and Map Positions. Hybridized arrays were scanned
on a GenePix scanner (Axon Instruments, Foster City, CA), and
fluorescence ratios (test/reference) calculated using ScanAlyze
software (available at http://rana.lbl.gov). Fluorescence ratios
were normalized for each array by setting the average log
fluorescence ratio for all array elements equal to 0.
Measurements with fluorescence intensities more than 20% above
background were considered reliable. DNA copy number profiles
that deviated significantly from background ratios measured in
normal genomic DNA control hybridizations were interpreted as
evidence of real DNA copy number alteration (see Estimating
Significance of Altered Fluorescence Ratios in the supporting
information). When indicated, DNA copy number profiles are
displayed as a moving average (symmetric 5-nearest neighbors).
Map positions for arrayed human cDNAs were assigned by

Fig. 1. Genome-wide measurement of DNA copy number alteration by array CGH. (a) DNA copy number profiles are illustrated for cell lines containing different
numbers of X chromosomes, for breast cancer cell lines, and for breast tumors. Each row represents a different cell line or tumor, and each column represents
one of 6,691 different mapped human genes present on the microarray, ordered by genome map position from 1pter through Xqter. Moving average (symmetric
5-nearest neighbors) fluorescence ratios (test/reference) are depicted using a log2-based pseudocolor scale (indicated), such that red luminescence reflects
fold-amplification, green luminescence reflects fold-deletion, and black indicates no change (gray indicates poorly measured data). (b) Enlarged view of DNA
copy number profiles across the X chromosome, shown for cell lines containing different numbers of X chromosomes.



identifying the starting position of the best and longest match of
any DNA sequence represented in the corresponding UniGene
cluster (10) against the "Golden Path" genome assembly
(http://genome.ucsc.edu/; Oct 7, 2000 Freeze). For UniGene
clusters represented by multiple arrayed elements, mean
fluorescence ratios (for all elements representing the same UniGene
cluster) are reported. For mRNA measurements, fluorescence
ratios are "mean-centered" (i.e., reported relative to the mean
ratio across the 44 tumor samples). The data set described here
can be accessed in its entirety in the supporting information.
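
The normalization, reliability filter, smoothing, and mean-centering
steps described above can be sketched in a few lines. This is a
minimal illustration, not the ScanAlyze-based processing actually
used: the function names, the base-2 logarithm, the reading of
"symmetric 5-nearest neighbors" as a five-element window, the
truncation of the window at the ends, the use of None for unreliable
spots, and mean-centering in log space are all assumptions made for
the example.

```python
import math

def normalize_log_ratios(ratios):
    """Center log2(test/reference) ratios so the array-wide mean is 0.

    `ratios` holds one fluorescence ratio per array element. None marks a
    spot that failed the reliability filter (an assumed convention) and is
    passed through unchanged.
    """
    logs = [math.log2(r) for r in ratios if r is not None]
    mean_log = sum(logs) / len(logs)
    return [None if r is None else math.log2(r) - mean_log for r in ratios]

def is_reliable(intensity, background, min_fraction_above=0.20):
    """Keep a spot only if its intensity is more than 20% above background."""
    return intensity > background * (1.0 + min_fraction_above)

def moving_average(values, half_window=2):
    """Moving average along genome order.

    Assumption: the window is the element itself plus two neighbors on each
    side (five elements), truncated at the ends; None values are skipped.
    """
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        window = [v for v in values[lo:hi] if v is not None]
        smoothed.append(sum(window) / len(window) if window else None)
    return smoothed

def mean_center(log_ratios):
    """Report one gene's log ratios relative to its mean across samples."""
    mean = sum(log_ratios) / len(log_ratios)
    return [v - mean for v in log_ratios]
```

Working in log space makes a 2-fold gain and a 2-fold loss symmetric
about zero, which is why the centering is applied to log ratios in
this sketch.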

Results 

We performed CGH on 44 predominantly locally advanced,
primary breast tumors and 10 breast cancer cell lines, using
cDNA microarrays containing 6,691 different mapped human
genes (Fig. 1a; also see Materials and Methods for details of
microarray hybridizations). To take full advantage of the
improved spatial resolution of array CGH, we ordered
(fluorescence ratios for) the 6,691 cDNAs according to the "Golden
Path" (http://genome.ucsc.edu/) genome assembly of the draft
human genome sequences (11). In so doing, arrayed cDNAs not
only themselves represent genes of potential interest (e.g.,
candidate oncogenes within amplicons), but also provide precise
genetic landmarks for chromosomal regions of amplification and
deletion. Parallel analysis of DNA from cell lines containing
different numbers of X chromosomes (Fig. 1b), as we did before
(7), demonstrated the sensitivity of our method to detect
single-copy loss (45,XO), and 1.5- (47,XXX), 2- (48,XXXX), or
2.5-fold (49,XXXXX) gains (also see Fig. 5, which is published
as supporting information on the PNAS web site). Fluorescence
ratios were linearly proportional to copy number ratios, which
were slightly underestimated, in agreement with previous
observations (7). Numerous DNA copy number alterations were
evident in both the breast cancer cell lines and primary tumors
(Fig. 1a), detected in the tumors despite the presence of euploid
non-tumor cell types; the magnitudes of the observed changes
were generally lower in the tumor samples. DNA copy number
alterations were found in every cancer cell line and tumor, and
on every human chromosome in at least one sample. Recurrent
regions of DNA copy number gain and loss were readily
identifiable. For example, gains within 1q, 8q, 17q, and 20q were
observed in a high proportion of breast cancer cell lines/tumors
(90%/69%, 100%/47%, 100%/60%, and 90%/44%,
respectively), as were losses within 1p, 3p, 8p, and 13q (80%/24%,
80%/22%, 80%/22%, and 70%/18%, respectively), consistent
with published cytogenetic studies (refs. 2-4; a complete listing
of gains/losses is provided in Tables 2 and 3, which are published
as supporting information on the PNAS web site). The total

Fig. 2. DNA copy number alteration across chromosome 8 by array CGH. (a) DNA copy number profiles are illustrated for cell lines containing different numbers
of X chromosomes, for breast cancer cell lines, and for breast tumors. Breast cancer cell lines and tumors are separately ordered by hierarchical clustering to
highlight recurrent copy number changes. The 241 genes present on the microarrays and mapping to chromosome 8 are ordered by position along the
chromosome. Fluorescence ratios (test/reference) are depicted by a log2 pseudocolor scale (indicated). Selected genes are indicated with color-coded text (red,
increased; green, decreased; black, no change; gray, not well measured) to reflect correspondingly altered mRNA levels (observed in the majority of the subset
of samples displaying the DNA copy number change). The map positions for genes of interest that are not represented on the microarray are indicated in the
row above those genes represented on the array. (b) Graphical display of DNA copy number profile for breast cancer cell line SKBR3. Fluorescence ratios
(tumor/normal) are plotted on a log2 scale along the chromosome.



number of genomic alterations (gains and losses) was found to
be significantly higher in breast tumors that were high grade (P =
0.008), consistent with published CGH data (3), estrogen
receptor negative (P = 0.04), and harboring TP53 mutations (P =
0.0006) (see Table 4, which is published as supporting
information on the PNAS web site).

The improved spatial resolution of our array CGH analysis is
illustrated for chromosome 8, which displayed extensive DNA
copy number alteration in our series. A detailed view of the
variation in the copy number of 241 genes mapping to
chromosome 8 revealed multiple regions of recurrent amplification;
each of these potentially harbors a different known or previously
uncharacterized oncogene (Fig. 2a). The complexity of amplicon
structure is most easily appreciated in the breast cancer cell line
SKBR3. Although a conventional CGH analysis of 8q in SKBR3
identified only two distinct regions of amplification (12), we
observed three distinct regions of high-level amplification
(labeled 1-3 in Fig. 2b). For each of these regions we can define the
boundaries of the interval recurrently amplified in the tumors we
examined; in each case, known or plausible candidate oncogenes
can be identified (a description of these regions, as well as the
recurrently amplified regions on chromosomes 17 and 20, can be
found in Figs. 6 and 7, which are published as supporting
information on the PNAS web site).

For a subset of breast cancer cell lines and tumors (4 and 37,
respectively), and a subset of arrayed genes (6,095), mRNA
levels were quantitatively measured in parallel by using cDNA
microarrays (8). The parallel assessment of mRNA levels is
useful in the interpretation of DNA copy number changes. For
example, the highly amplified genes that are also highly
expressed are the strongest candidate oncogenes within an
amplicon. Perhaps more significantly, our parallel analysis of DNA
copy number changes and mRNA levels provides us the
opportunity to assess the global impact of widespread DNA copy
number alteration on gene expression in tumor cells.

A strong influence of DNA copy number on gene expression
is evident in an examination of the pseudocolor representations

Fig. 3. Concordance between DNA copy number and gene expression across chromosome 17. DNA copy number alteration (Upper) and mRNA levels (Lower)
are illustrated for breast cancer cell lines and tumors. Breast cancer cell lines and tumors are separately ordered by hierarchical clustering (Upper), and the
identical sample order is maintained (Lower). The 354 genes present on the microarrays and mapping to chromosome 17, and for which both DNA copy number
and mRNA levels were determined, are ordered by position along the chromosome; selected genes are indicated in color-coded text (see Fig. 2 legend).
Fluorescence ratios (test/reference) are depicted by separate log2 pseudocolor scales (indicated).

of DNA copy number and mRNA levels for genes on
chromosome 17 (Fig. 3). The overall patterns of gene amplification and
elevated gene expression are quite concordant; i.e., a significant
fraction of highly amplified genes appear to be correspondingly
highly expressed. The concordance between high-level
amplification and increased gene expression is not restricted to
chromosome 17. Genome-wide, of 117 high-level DNA
amplifications (fluorescence ratios >4, and representing 91 different
genes), 62% (representing 54 different genes; see Table 5, which
is published as supporting information on the PNAS web site)
are found associated with at least moderately elevated mRNA
levels (mean-centered fluorescence ratios >2), and 42%
(representing 36 different genes) are found associated with
comparably highly elevated mRNA levels (mean-centered fluorescence
ratios >4).
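
The concordance figures quoted above amount to a simple tally over
paired measurements. The sketch below shows that tally under stated
assumptions: the function name and the data layout (a list of DNA
ratio and mean-centered mRNA ratio pairs, one per gene and sample)
are hypothetical, and the toy values are invented solely to exercise
the code. Only the cutoffs (>4 for high-level amplification, >2 and
>4 for elevated mRNA) come from the text.

```python
def amplification_concordance(pairs, dna_cutoff=4.0):
    """Tally mRNA elevation among high-level amplifications.

    `pairs` is an iterable of (dna_ratio, mrna_ratio) tuples, where
    dna_ratio is the test/reference fluorescence ratio and mrna_ratio is
    the mean-centered mRNA fluorescence ratio (both on a linear scale).
    Returns (n_amplified, fraction with mRNA > 2, fraction with mRNA > 4).
    """
    amplified = [m for d, m in pairs if d > dna_cutoff]
    n = len(amplified)
    if n == 0:
        return 0, 0.0, 0.0
    moderate = sum(1 for m in amplified if m > 2.0)
    high = sum(1 for m in amplified if m > 4.0)
    return n, moderate / n, high / n

# Invented toy data, not values from the study.
toy = [(5.1, 6.0), (4.5, 2.5), (8.0, 1.1), (6.2, 4.4), (1.0, 3.0), (0.7, 0.5)]
n, frac_moderate, frac_high = amplification_concordance(toy)
```

Note that the "moderately elevated" fraction includes the "highly
elevated" cases, matching the "at least moderately elevated" wording.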

To determine the extent to which DNA deletion and
lower-level amplification (in addition to high-level amplification) are
also associated with corresponding alterations in mRNA levels,
we performed three separate analyses on the complete data set
(4 cell lines and 37 tumors, across 6,095 genes). First, we
determined the average mRNA levels for each of five classes
of genes, representing DNA deletion, no change, and low-,
medium-, and high-level amplification (Fig. 4a). For both the
breast cancer cell lines and tumors, average mRNA levels
tracked with DNA copy number across all five classes, in a
statistically significant fashion (P values for pair-wise Student's
t tests comparing adjacent classes: cell lines, 4 x 10""^, 1 x 10"*^,
5 x 10^^, 1 x 10-2; tumors, 1 x 10-^, 1 x 10-^i*, 5 x 10-^i,
1 x 10~'*). A linear regression of the average log(DNA copy
number), for each class, against average log(mRNA level)
demonstrated that on average, a 2-fold change in DNA copy
number was accompanied by 1.4- and 1.5-fold changes in mRNA
level for the breast cancer cell lines and tumors, respectively (Fig.
4a, regression line not shown). Second, we characterized the
distribution of the 6,095 correlations between DNA copy
number and mRNA level, each across the 37 tumor samples (Fig. 4b).
The distribution of correlations forms a normal-shaped curve,
but with the peak markedly shifted in the positive direction from
zero. This shift is statistically significant, as evidenced in a plot
of observed vs. expected correlations (Fig. 4c), and reflects a
pervasive global influence of DNA copy number alterations on
gene expression. Notably, the highest correlations between DNA
copy number and mRNA level (the right tail of the distribution
in Fig. 4b) comprise both amplified and deleted genes (data not
shown). Third, we used a linear regression model to estimate the
fraction of all variation measured in mRNA levels among the 37

Fig. 4. Genome-wide influence of DNA copy number alterations on mRNA levels. (a) For breast cancer cell lines (gray) and tumor samples (black), both
mean-centered mRNA fluorescence ratio (log2 scale) quartiles (box plots indicate 25th, 50th, and 75th percentile) and averages (diamonds; Y-value error bars
indicate standard errors of the mean) are plotted for each of five classes of genes, representing DNA deletion (tumor/normal ratio < 0.8), no change (0.8-1.2),
low- (1.2-2), medium- (2-4), and high-level amplification. P values for pair-wise Student's t tests, comparing averages between adjacent classes (moving
left to right), are 4 x 10-^, 1 x 10-<», 5 x 10^, 1 x 10-> (cell lines), and 1 x 10-«», 1 x 10""*, 5 x 10-<\, 1 x 10-* (tumors). (b) Distribution of correlations between
DNA copy number and mRNA levels, for 6,095 different human genes across 37 breast tumor samples. (c) Plot of observed versus expected correlation coefficients.
The expected values were obtained by randomization of the sample labels in the DNA copy number data set. The line of unity is indicated. (d) Percent variance
in gene expression (among tumors) directly explained by variation in gene copy number. Percent variance explained (black line) and fraction of data retained
(gray line) are plotted for different fluorescence intensity/background (a rough surrogate for signal/noise) cutoff values. Fraction of data retained is relative
to the 1.2 intensity/background cutoff. Details of the linear regression model used to estimate the fraction of variation in gene expression attributable to
underlying DNA copy number alteration can be found in the supporting information (see Estimating the Fraction of Variation in Gene Expression Attributable
to Underlying DNA Copy Number Alteration).



tumors that could be attributed to underlying variation in DNA
copy number. From this analysis, we estimate that, overall, about
7% of all of the observed variation in mRNA levels can be
explained directly by variation in copy number of the altered
genes (Fig. 4d). We can reduce the effects of experimental
measurement error on this estimate by using only that fraction
of the data most reliably measured (fluorescence intensity/
background >3); using that data, our estimate of the percent
variation in mRNA levels directly attributed to variation in gene
copy number increases to 12% (Fig. 4d). This still undoubtedly
represents a significant underestimate, as the observed variation
in global gene expression is affected not only by true variation in
the expression programs of the tumor cells themselves, but also
by the variable presence of non-tumor cell types within clinical
samples.
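
Two of the calculations in the analyses above lend themselves to a
short sketch: converting a log-log regression slope into the mRNA
fold change that accompanies a 2-fold DNA copy number change, and
building a chance distribution of correlations by shuffling sample
labels in the DNA copy number data. The code is an illustration
under stated assumptions, not the regression model itself (which is
detailed in the supporting information): the function names, the
ordinary least-squares fit, the base-2 logarithm, and the fixed
random seed are choices made for the example.

```python
import math
import random

def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def fold_change_per_doubling(log2_dna, log2_mrna):
    """mRNA fold change that accompanies a 2-fold DNA copy number change.

    With both axes in log2 units (an assumption), a slope b means that
    doubling the DNA copy number multiplies the mRNA level by 2**b.
    """
    return 2.0 ** ols_slope(log2_dna, log2_mrna)

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def null_correlations(dna, mrna, n_shuffles=1000, seed=0):
    """Correlations expected by chance, from shuffled DNA sample labels."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    shuffled = list(dna)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        null.append(pearson(shuffled, mrna))
    return null
```

Under these assumptions a slope of about 0.585 (the base-2 logarithm
of 1.5) corresponds to the 1.5-fold change reported for the tumors.
For a single gene, the squared Pearson correlation is the fraction
of variance in mRNA level that a straight-line fit on DNA copy
number explains.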

Discussion

This genome-wide, array CGH analysis of DNA copy number
alteration in a series of human breast tumors demonstrates the
usefulness of defining amplicon boundaries at high resolution
(gene-by-gene), and quantitatively measuring amplicon shape, to
assist in locating and identifying candidate oncogenes. By
analyzing mRNA levels in parallel, we have also discovered that
changes in DNA copy number have a large, pervasive, direct
effect on global gene expression patterns in both breast cancer
cell lines and tumors. Although the DNA microarrays used in our
analysis may display a bias toward characterized and/or highly
expressed genes, because we are examining such a large fraction
of the genome (approximately 20% of all human genes), and
because, as detailed above, we are likely underestimating the
contribution of DNA copy number changes to altered gene
expression, we believe our findings are likely to be generalizable
(but would nevertheless still be remarkable if only applicable to
this set of ~6,100 genes).

In budding yeast, aneuploidy has been shown to result in
chromosome-wide gene expression biases (13). Two recent
studies have begun to examine the global relationship between
DNA copy number and gene expression in cancer cells. In
agreement with our findings, Phillips et al. (14) have shown that
with the acquisition of tumorigenicity in an immortalized
prostate epithelial cell line, new chromosomal gains and losses
resulted in a statistically significant respective increase and
decrease in the average expression level of involved genes. In
contrast, Platzer et al. (15) recently reported that in metastatic
colon tumors only ~4% of genes within amplified regions were
found more highly (>2-fold) expressed, when compared with
normal colonic epithelium. This report differs substantially from
our finding that 62% of highly amplified genes in breast cancer
exhibit at least 2-fold increased expression. These contrasting
findings may reflect methodological differences between the
studies. For example, the study of Platzer et al. (15) may have
systematically under-measured gene expression changes. In this
regard it is remarkable that only 14 transcripts of many thousand
residing within unamplified chromosomal regions were found to
exhibit at least ^-fold altered expression in metastatic colon
cancer. Additionally, their reliance on lower-resolution
chromosomal CGH may have resulted in poorly delimiting the
boundaries of high-complexity amplicons, effectively overcalling
regions with amplification. Alternatively, the contrasting findings
for amplified genes may represent real biological differences
between breast and metastatic colon tumors; resolution of this
issue will require further studies.

Our finding that widespread DNA copy number alteration has
a large, pervasive and direct effect on global gene expression
patterns in breast cancer has several important implications.
First, this finding supports a high degree of copy
number-dependent gene expression in tumors. Second, it suggests that
most genes are not subject to specific autoregulation or dosage
compensation. Third, this finding cautions that elevated
expression of an amplified gene cannot alone be considered strong
independent evidence of a candidate oncogene's role in
tumorigenesis. In our study, fully 62% of highly amplified genes
demonstrated moderately or highly elevated expression. This
highlights the importance of high-resolution mapping of
amplicon boundaries and shape [to identify the "driving" gene(s)
within amplicons (16)], on a large number of samples, in addition
to functional studies. Fourth, this finding suggests that analyzing
the genomic distribution of expressed genes, even within existing
microarray gene expression data sets, may permit the inference
of DNA copy number aberration, particularly aneuploidy (where
gene expression can be averaged across large chromosomal
regions; see Fig. 3 and supporting information). Fifth, this
finding implies that a substantial portion of the phenotypic
uniqueness (and by extension, the heterogeneity in clinical
behavior) among patients' tumors may be traceable to
underlying variation in DNA copy number. Sixth, this finding supports
a possible role for widespread DNA copy number alteration in
tumorigenesis (17, 18), beyond the amplification of specific
oncogenes and deletion of specific tumor suppressor genes.
Widespread DNA copy number alteration, and the concomitant
widespread imbalance in gene expression, might disrupt critical
stoichiometric relationships in cell metabolism and physiology
(e.g., proteasome, mitotic spindle), possibly promoting further
chromosomal instability and directly contributing to tumor
development or progression. Finally, our findings suggest the
possibility of cancer therapies that exploit specific or global
imbalances in gene expression in cancer.